In this series we already introduced GCC, made it able to compile C programs and so on, but we didn’t solve how to build that GCC with a simpler compiler. In this post I’ll try to explain which changes must be applied across the whole ecosystem to make this possible.
The current status
I already talked about this in the past, but it’s always a good moment to recall the bootstrapping process we are immersed in. There are steps before these, but I’m going to start at GNU Mes, which is the core of all this.
For the part that interests us, GNU Mes has a C compiler, called MesCC. This C compiler is the one we use to compile TinyCC. We use that TinyCC to compile a really old version of GCC, the 2.95, and from that we compile more recent versions until we reach the current one. From the current one we compile the world.
That’s the theory, and it’s what we currently have in the most widely supported architectures (i386 and maybe some ARM flavour). Problems arise when you deal with some new architecture, like the one we have to deal with: RISC-V.
RISC-V was invented recently, and compilers did not add support for it until a few years ago. GCC gained RISC-V support in its 7 series; the 7.5 release, which we have been discussing through this series, needs a C++ compiler in order to be built. That’s a problem we almost solved in the previous steps by backporting the RISC-V support to a GCC that only needs a C compiler to be built.
Now, extra problems appear. Which C compiler are we going to use to build that GCC 4.6.4 that has the RISC-V support we backported?
According to the process we described, we should use GCC 2.95, but it doesn’t support RISC-V so we would need to backport the RISC-V support to that one too. That’s not cool.
Another option would be to remove GCC 2.95 from the equation and compile GCC 4.6.4 directly from TinyCC, if that’s possible. This would also make the whole process faster by removing some dependencies. But it means TinyCC has to be able to compile GCC 4.6.4. We are going to try this option, but it requires some work that we will describe today.
On the other hand, in order to be able to build all this for RISC-V, TinyCC and MesCC have to be able to target RISC-V…
Too many conditions have to be true for all this to work. But hey! Let’s go step by step.
RISC-V support in TinyCC
First, we have to make sure that TinyCC has RISC-V support, and it does. TinyCC has recently become able to compile, assemble and link for RISC-V, although only for 64 bits.
I tested this support using a TinyCC cross-compiler and it works. If you want to try it, I have a simple Guix package for the cross-compiler, and I also fixed the official Guix package for the native TinyCC, which had been broken for a long time.
I haven’t tested the RISC-V support natively yet, but if the cross-compiler works, chances are the native one will also work, so I’m not really worried about this point.
GNU Mes compiling TinyCC
GNU Mes supports an old C standard that is simpler than the one TinyCC uses, so it uses a fork of TinyCC with some C features removed. This fork was made long before the RISC-V support was added to TinyCC, and many things have changed since then.
We need to backport the TinyCC RISC-V support to Mes’s own TinyCC fork, then. Or at least do something about it.
When I first took a look into this issue, I thought it would be an easy fix: I had already backported GCC, which is orders of magnitude larger than TinyCC… But it’s not that easy. TinyCC’s internal API has changed quite a bit since the fork was made, and I need to review all of it in order to make it work. This process also requires converting all the modern C that MesCC does not support to the older C constructs that it does.
It’s a lot of work, but it’s doable to a certain degree, and this might be a big step for the full source bootstrap process. Like what I did in GCC, it’s not going to solve everything, but it’s a huge step in the right direction.
GNU Mes supporting RISC-V
On the lower level part of the story, if we want to make all this process work for RISC-V, GNU Mes itself should be runnable on it, and able to generate binaries for it.
There have been efforts to make all this possible, and I don’t expect this support to take long to finally appear in GNU Mes. It’s just a matter of time and funding. I am aware that Jan is also interested in spending time on this, so I think we are covered in this area.
GCC compilation with TinyCC
The only point we are missing then is to be able to build the backported GCC from TinyCC, without the intermediate GCC 2.95. This is a tough one to test and achieve, because the GCC compilation process is extremely complex, and we need to make quite complex packages for this process to work.
On the other hand, the work I already did packaging my backported GCC for Guix is not enough, for two reasons: it was designed to work with a modern GCC toolchain, not with TinyCC; and a cross-compiler is not the same thing as a native one.
GCC is normally compiled in stages, which the GCC build system calls bootstrap. I described a little bit of that process in a footnote in a past post. That process is not activated in a cross-compilation environment, which is what I used to test the backend I backported. If the bootstrap process doesn’t work, the compilation fails, so building natively exposes possible build system errors that we were avoiding thanks to the cross-compilation trick.
I did this on purpose, of course. I just wanted a simple working environment that let me test the backported RISC-V backend of the compiler, but now we need to make a proper package for GCC 4.6.4, and make it work with TinyCC.
I wouldn’t mention this if I hadn’t tried and failed to make this package. It’s not especially difficult to make a package, or it doesn’t look like it, until you get errors like:
configure: error: C compiler cannot create executables
`¯\_(ツ)_/¯`
That being said, this is not only a packaging issue. As we already mentioned, we are removing GCC 2.95 from the pipeline, so TinyCC has to be able to deal with the GCC 4.6.4 codebase directly, including the backport I did.
The easiest way to test this is to compile GCC 4.6.4 for x86_64 on my machine, with no emulation in between, so we can find the things TinyCC can’t deal with. Later we would be able to test this further in an emulated environment or directly on a RISC-V machine to make sure TinyCC can deal with the RISC-V backend, but for a first review of the GCC core, using x86_64 can be enough. It requires no weird setup beyond a working package… Ouch!
I’m not really good at this part and I’m not sure if anyone else is, but I don’t feel like spending time trying to make this package cascade. I feel like my time is better spent on fixing stuff, or, once the package cascade is done, fixing the compatibility.
During the whole project, making Guix packages and figuring out build systems is the part where most of the time was spent, and it’s the one with the lowest success rate. It feels like I wasted hours trying to make the build process work for nothing.
The funny part is that Guix is partially to blame here: not conforming to the FHS and having this weird way to handle inputs is what makes the whole process really complex. Code has to be patched to find the libraries, scripts must be patched too, binaries are hard to find… On the good side, it’s Guix that makes this work worth the effort, and also what makes this process reproducible, once it’s done, to let everyone enjoy it.
Wait, but didn’t Mes use a TinyCC fork?
Oh yeah, of course. What I forgot to mention is that the step we just described, making TinyCC able to compile the backported GCC 4.6.4, is not as simple as I made it sound. If we use upstream TinyCC to compile GCC, who is going to compile that TinyCC? We already said MesCC is not able to do that directly.
We could build that TinyCC with the TinyCC fork Mes has, or make the TinyCC fork go directly for GCC 4.6.4, but in any case there’s an obvious task to tackle: the RISC-V support must land in the TinyCC fork before we can do anything else. And that’s where I want to focus.
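To keep all these conditions in view, the chain can be written down as plain data. This is only a sketch: the `struct stage` type, the `count_blockers` helper and the hard-coded flags are names I made up for illustration, and the flags simply mirror what this post says about each compiler. Nothing here queries the real tools.

```c
#include <assert.h>
#include <stddef.h>

/* One compiler in the bootstrap chain (hypothetical model, not a real API). */
struct stage {
    const char *name;
    int targets_riscv; /* 1 if, according to this post, it can target RISC-V */
};

/* The chain as it is today, with GCC 2.95 in the middle. */
static const struct stage current_chain[] = {
    { "MesCC",                0 }, /* support still pending in GNU Mes */
    { "TinyCC (Mes fork)",    0 }, /* forked before the RISC-V backend existed */
    { "GCC 2.95",             0 }, /* would need its own backport */
    { "GCC 4.6.4 (backport)", 1 }
};

/* One possible target: the TinyCC fork builds GCC 4.6.4 directly. */
static const struct stage proposed_chain[] = {
    { "MesCC",                0 },
    { "TinyCC (Mes fork)",    0 },
    { "GCC 4.6.4 (backport)", 1 }
};

/* Count how many stages still cannot target RISC-V. */
size_t count_blockers(const struct stage *chain, size_t len)
{
    size_t i;
    size_t blockers = 0;

    for (i = 0; i < len; i++) {
        if (!chain[i].targets_riscv)
            blockers++;
    }
    return blockers;
}

/* Small self-check: dropping GCC 2.95 removes one blocker. */
void check_chains(void)
{
    size_t n_current = sizeof current_chain / sizeof current_chain[0];
    size_t n_proposed = sizeof proposed_chain / sizeof proposed_chain[0];

    assert(count_blockers(current_chain, n_current) == 3);
    assert(count_blockers(proposed_chain, n_proposed) == 2);
}
```

Under these assumptions, removing GCC 2.95 takes one blocker away, and the TinyCC fork is the next one within my reach.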
This is not only about RISC-V
I have to be clear with you: I mixed two problems together and I did that on purpose.
On the one hand we have the RISC-V support related changes. And on the other hand we have the changes on the compilation pipeline: the removal of GCC 2.95.
The second part is just a consequence of the first, but it’s not only related to the RISC-V world. Once we have our compilers ready, we are going to apply the change to the whole thing. Removing a step is a really important task for many reasons, but one is obvious at this point: having a really old compiler like GCC 2.95 forces us to stay with the architectures it was able to target, or makes us add them and maintain them ourselves. It’s a huge flexibility issue for the little gain it gives: GCC 4.6.4 is already compilable from a C90 compiler.
So, this is an important milestone, not only for my part of the job but also for the whole GNU Mes and bootstrapping effort. Skipping GCC 2.95 has to be done in every architecture, and the packaging effort of that is unavoidable.
What I already did
While I was reviewing what needed to be done, I started doing things here and there, preparing the work and making sure I understood the context better.
First, I realized I had introduced some non-C90 constructs in the backport of GCC, because I directly copied some code from 7.5, so I removed them. This is important, because we need to be able to compile all this with TinyCC, and I don’t expect TinyCC to support modern constructs.
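To give an idea of the kind of change involved, here is a made-up function (not code from GCC or from my backport) first in the C99 style a newer codebase might use, and then rewritten so a C90 compiler accepts it. The names `sum_positive` and `check_sum_positive` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* C99 style, kept inside a comment because a strict C90 compiler would
   reject it:

   long sum_positive(const long *values, size_t count)
   {
       long total = 0;
       for (size_t i = 0; i < count; i++) {   // declaration inside `for`
           if (values[i] <= 0)
               continue;
           long v = values[i];                // declaration after a statement
           total += v;
       }
       return total;
   }
*/

/* C90 version: every declaration sits at the start of its block, and
   only block comments are used. */
long sum_positive(const long *values, size_t count)
{
    size_t i;
    long total = 0;

    for (i = 0; i < count; i++) {
        long v; /* declared before any statement of this block */

        if (values[i] <= 0)
            continue;
        v = values[i];
        total += v;
    }
    return total;
}

/* Small self-check: only 3, 4 and 5 are added. */
void check_sum_positive(void)
{
    long data[] = { 3, -1, 4, 0, 5 };

    assert(sum_positive(data, sizeof data / sizeof data[0]) == 12);
    assert(sum_positive(data, 0) == 0);
}
```

With a modern GCC, compiling with `-std=c90 -pedantic` is one way to get warnings about constructs like the ones in the commented version before handing the code to a simpler compiler.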
I packaged a TinyCC RISC-V cross-compiler for the upstream project, and also for the Mes fork, even though the latter can’t be used for compilation yet: we need to backport the backend in order to make it work. Still, it’s important work, because it lets me start the backport easily. I’ll need to apply more changes on top of it, for sure, but at the moment I have all I need to start coding the new backend.
I spent countless hours trying to make a proper GCC package and trying to use TinyCC as the C compiler for it, with no success. This is why I decided to move on and work on a more interesting and usable part: adding the RISC-V backend to the Mes fork of TinyCC.
Of course, I already started working on the RISC-V support of the TinyCC fork from Mes, and started encountering API mismatches here and there. Most of them are related to some optimizations introduced after the fork, which I need to review in more detail in the upcoming weeks. I also spent some time trying to understand how TinyCC works, and it’s a very interesting approach, I have to say [1].
Conclusions
I’d love to tackle all these problems together and fix the whole system, but I’m just one guy coding from his couch. It’s not realistic to think I can fix everything, and trying to do so is detrimental to my mental health.
So I decided to go for the RISC-V support for the TinyCC fork we have at Mes. This would leave all the ingredients ready for someone more experienced than me to make the final recipe.
The same thing happened with the GCC backport. I didn’t really finish the job: there’s no C++ compiler working yet, but that’s not what matters. Anyone can take what I did, package it properly, which happened to be an impossible task for me, and get it ready. We already made a huge step.
Fighting against a wall is bad for everyone; it’s better to pick a task where you can provide something. You feel better, and the overall state of the project is improved. Achieving things is the best gasoline you can get for achieving new things.
Regarding the task I chose, I’ve already spent some hours working on it. It’s not an easy task. The internal TinyCC API has changed a lot since the fork was made, and there are many commits related to RISC-V since then. One of the most recent ones fixes the RISC-V assembler after I reported it wasn’t working, a few weeks ago. All these changes must be reviewed carefully, undoing the API changes and also, most importantly, keeping the code compatible with GNU Mes’s C compiler.
Not an easy task.
[1] Maybe I’ll have the time to explain it in a future blog post, maybe not.